Last Update: 2025/3/26
LLMVision Audio Transcription API
The LLMVision Audio Transcription API transcribes a provided audio file into text in the input language.
Endpoint
POST https://platform.llmprovider.ai/v1/audio/transcriptions
Request Headers
| Header | Value |
|---|---|
| Authorization | `Bearer YOUR_API_KEY` |
| Content-Type | `multipart/form-data` |
Request Body
| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file object (not the file name) to transcribe, in one of these formats: `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm`. Maximum file size: 20 MB. |
| model | string | ID of the model to use (e.g., `lmp-stt-20241013`). |
| prompt | string | (Optional) Text to guide the model's style or continue a previous audio segment. |
| response_format | string | (Optional) The format of the transcript output: `json`, `text`, `srt`, `verbose_json`, or `vtt`. Default is `json`. |
| temperature | number | (Optional) The sampling temperature, between 0 and 1. Default is 0. |
| language | string | (Optional) The language of the input audio (e.g., `en`, `es`, `fr`). |
| timestamp_granularities[] | array | (Optional) The timestamp granularities to populate for this transcription. |
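As a sketch of how the optional parameters above combine into the multipart form body, the helper below assembles the form fields for a verbose, word-timestamped request. The parameter names come from the table; `build_transcription_fields` itself is an illustrative helper of this document, not part of the API.

```python
# Illustrative helper (not part of the API): assembles the multipart form
# fields from the request-body table for a transcription request.
def build_transcription_fields(model, response_format="json",
                               temperature=0, language=None,
                               timestamp_granularities=None):
    data = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language is not None:
        data["language"] = language
    for g in timestamp_granularities or []:
        # Array parameters repeat the bracketed key in the form body.
        data.setdefault("timestamp_granularities[]", []).append(g)
    return data

fields = build_transcription_fields(
    "lmp-stt-20241013",
    response_format="verbose_json",
    language="en",
    timestamp_granularities=["word"],
)
print(fields)
```

The resulting dictionary can be passed as the `data=` argument of a `requests.post` call alongside the `files=` upload.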
Response Body
The response is a transcription object or a verbose transcription object, depending on `response_format`.
The transcription object (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | The transcribed text. |

```json
{
  "text": "Hello, this is the transcribed text from the audio file."
}
```
The transcription object (Verbose JSON)
| Parameter | Type | Description |
|---|---|---|
| task | string | The task performed by the model. |
| language | string | The language of the input audio. |
| duration | number | The duration of the audio in seconds. |
| segments | array | Segments of the transcribed text and their corresponding details. |
| text | string | The transcribed text. |
| words | array | Extracted words and their corresponding timestamps. |
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [50364, 2425, 11, 359, 307, 1161, 1123, 422, 264, 1467, 1780],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
```
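The `segments` array carries enough timing detail to produce subtitle output. As an offline sketch (using the example payload above rather than a live API call), the snippet below formats each segment as an SRT cue; `to_srt` is a hypothetical helper, not part of the API.

```python
# Sketch: format verbose-transcription segments as SRT cues (offline, no API call).
def to_srt(segments):
    def ts(seconds):
        # SRT timestamps use HH:MM:SS,mmm.
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}")
    return "\n\n".join(cues)

# Segment taken from the verbose JSON example above.
segments = [{"id": 0, "start": 0.0, "end": 2.95,
             "text": "Hello, this is the transcribed text from the audio file."}]
print(to_srt(segments))
```

Requesting `response_format=srt` returns subtitles directly; this helper is only useful when you also want the structured `verbose_json` fields.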
Example Request
- Shell
- nodejs
- python
```shell
curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="lmp-stt-20241013"
```
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Read the API key from the environment rather than hard-coding it.
const YOUR_API_KEY = process.env.YOUR_API_KEY;

const formData = new FormData();
formData.append('file', fs.createReadStream('audio.mp3'));
formData.append('model', 'lmp-stt-20241013');

axios.post('https://platform.llmprovider.ai/v1/audio/transcriptions', formData, {
  headers: {
    'Authorization': `Bearer ${YOUR_API_KEY}`,
    ...formData.getHeaders()
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error('Error:', error);
  });
```
```python
import os

import requests

headers = {
    # Read the API key from the environment rather than hard-coding it.
    "Authorization": f"Bearer {os.environ['YOUR_API_KEY']}"
}

# Use a context manager so the file handle is closed after the upload.
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://platform.llmprovider.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data={"model": "lmp-stt-20241013"},
    )

print(response.json())
```
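Given the 20 MB upload cap noted in the request-body table, a pre-flight size check avoids a rejected request. `check_upload_size` is an illustrative helper of this document, and the limit is assumed to mean 20 * 1024 * 1024 bytes:

```python
import os

# 20 MB limit from the request-body table (assumed to be binary megabytes).
MAX_UPLOAD_BYTES = 20 * 1024 * 1024

def check_upload_size(path):
    """Return the file size in bytes, raising before any network round trip
    if the file exceeds the documented upload limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; limit is {MAX_UPLOAD_BYTES}")
    return size
```

Call it on the audio file path before opening it for the `requests.post` upload.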
For any questions or further assistance, please contact us at [email protected].